Protein Sequence Pattern Mining with Constraints
Authors
Abstract
Biological sequence databases typically have a small alphabet, very long sequences, and a relatively small size (several hundred sequences). Considering these characteristics, we propose a new sequence mining algorithm, gIL. gIL was developed for linear sequence pattern mining and combines some of the most efficient techniques used in sequence and itemset mining. The algorithm is highly adaptable, allowing a smooth and direct introduction of various types of features into the mining process, namely the extraction of rigid and arbitrary gap patterns. Both breadth-first and depth-first traversals are possible. The experimental evaluation, on synthetic and real-life protein databases, has shown that our algorithm outperforms state-of-the-art algorithms. The use of constraints has also proved to be a very useful tool for specifying the patterns that interest the user.
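To make the notions of rigid and arbitrary gap patterns concrete, the following is a minimal sketch of support counting for a gapped pattern over a toy protein database. It is not the gIL algorithm, whose internals the abstract does not describe; it is a brute-force illustration using regular expressions. The function names `gap_pattern_to_regex` and `support`, the toy sequences, and the convention that a rigid gap has equal minimum and maximum lengths are all assumptions made for this example.

```python
import re


def gap_pattern_to_regex(symbols, gaps):
    """Build a compiled regex for a gapped pattern (hypothetical helper).

    symbols: the pattern's residues, e.g. "AL".
    gaps: one (min_gap, max_gap) pair between each adjacent pair of symbols.
          A rigid gap has min_gap == max_gap; an arbitrary (flexible) gap
          has min_gap < max_gap.
    """
    if len(gaps) != len(symbols) - 1:
        raise ValueError("need exactly one gap between each pair of symbols")
    parts = [re.escape(symbols[0])]
    for sym, (lo, hi) in zip(symbols[1:], gaps):
        parts.append(".{%d,%d}" % (lo, hi))  # lo..hi wildcard positions
        parts.append(re.escape(sym))
    return re.compile("".join(parts))


def support(pattern, sequences):
    """Count the sequences that contain at least one match of the pattern."""
    return sum(1 for seq in sequences if pattern.search(seq))


# Toy database (made up for illustration, not real protein data).
database = ["MKAVLC", "AKTLCW", "GAVVLC", "MMMM"]

rigid = gap_pattern_to_regex("AL", [(2, 2)])     # A, exactly 2 wildcards, L
flexible = gap_pattern_to_regex("AL", [(1, 2)])  # A, 1 to 2 wildcards, L

print(support(rigid, database))
print(support(flexible, database))
```

A pattern would then be reported as frequent when its support meets a user-chosen minimum threshold. Additional constraints of the kind the abstract mentions could be expressed as further filters on the candidate patterns.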
Similar resources
NOSEP: Nonoverlapping Sequence Pattern Mining With Gap Constraints.
Sequence pattern mining aims to discover frequent subsequences as patterns in a single sequence or a sequence database. By combining gap constraints (or flexible wildcards), users can specify special characteristics of the patterns and discover meaningful subsequences suitable for their own application domains, such as finding gene transcription sites from DNA sequences or discovering patterns ...
A Constraint Programming Approach for Mining Sequential Patterns in a Sequence Database
Constraint-based pattern discovery is at the core of numerous data mining tasks. Patterns are extracted with respect to a given set of constraints (frequency, closedness, size, etc). In the context of sequential pattern mining, a large number of devoted techniques have been developed for solving particular classes of constraints. The aim of this paper is to investigate the use of Constraint Pro...
cSPADE -UE: Algorithm for Sequence Mining for Unstructured Elements Using Time Gap Constraints
We present a new state machine that combines two techniques for complex data sequences: Data modeling and frequent sequence mining. This algorithm relies on unstructured variable gap sequence miner, to mine frequent patterns with different gap between elements. Here we will have two variations: Sequence pruning technique for other primary frequent sequences to reduce space complexity and allow...
Generalized Sequential Pattern Mining with Item Intervals
Sequential pattern mining is an important data mining method with broad applications that can extract frequent sequences while maintaining their order. However, it is important to identify item intervals of sequential patterns extracted by sequential pattern mining. For example, a sequence < A, B > with a 1-day interval and a sequence < A, B > with a 1-year interval are completely different; th...
Approaches for Pattern Discovery Using Sequential Data Mining
In this chapter we first introduce sequence data. We then discuss different approaches for mining of patterns from sequence data, studied in literature. Apriori based methods and the pattern growth methods are the earliest and the most influential methods for sequential pattern mining. There is also a vertical format based method which works on a dual representation of the sequence database. Wo...
Publication date: 2005